
    Cross-Lingual Dependency Parsing for Closely Related Languages - Helsinki's Submission to VarDial 2017

    This paper describes the submission from the University of Helsinki to the shared task on cross-lingual dependency parsing at VarDial 2017. We present work on annotation projection and treebank translation that gave good results for all three target languages in the test set. In particular, Slovak works well with information coming from the Czech treebank, which is in line with related work; the attachment scores for cross-lingual models even surpass fully supervised models trained on the target-language treebank. Croatian is the most difficult language in the test set, and the improvements over the baseline are rather modest. Norwegian works best with information coming from Swedish, whereas Danish contributes surprisingly little.

    From open parallel corpora to public translation tools: The success story of OPUS

    This paper describes the success of OPUS, which started as a small side project and has grown into a full-fledged ecosystem for training and deploying open machine translation systems. We briefly present the current state of the framework, focusing on the mission of increasing language coverage and translation quality in public translation models and tools that can easily be integrated into end-user applications and professional workflows. OPUS now provides the biggest hub of freely available parallel data, and thousands of open translation models have been released, supporting hundreds of languages in various combinations.

    Multilingual NMT with a language-independent attention bridge

    In this paper, we propose a multilingual encoder-decoder architecture capable of obtaining multilingual sentence representations by incorporating an intermediate attention bridge that is shared across all languages. That is, we train the model with language-specific encoders and decoders that are connected via self-attention to a shared layer that we call the attention bridge. This layer exploits the semantics of each language for translation and develops into a language-independent meaning representation that can efficiently be used for transfer learning. We present a new framework for the efficient development of multilingual NMT using this model and scheduled training. We have tested the approach systematically with a multi-parallel data set. We show that the model achieves substantial improvements over strong bilingual models and that it also works well for zero-shot translation, which demonstrates its capacity for abstraction and transfer learning.
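    A minimal sketch of the shared attention-bridge idea, assuming a PyTorch setup with hypothetical module names, dimensions, and number of attention heads; the paper's actual implementation details are not reproduced here and will differ.

```python
# Hedged sketch: language-specific encoders feed a shared attention bridge
# that pools variable-length encoder states into a fixed number of vectors.
# Dimensions, head count, and module names are illustrative assumptions.
import torch
import torch.nn as nn

class AttentionBridge(nn.Module):
    """Structured self-attention that turns (batch, src_len, hidden) encoder
    states into a fixed-size, language-independent set of vectors."""
    def __init__(self, hidden_dim: int, num_heads: int = 10):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, hidden_dim)
        self.heads = nn.Linear(hidden_dim, num_heads)

    def forward(self, enc_states: torch.Tensor) -> torch.Tensor:
        scores = self.heads(torch.tanh(self.proj(enc_states)))  # (batch, src_len, num_heads)
        attn = torch.softmax(scores, dim=1)                      # attention over source positions
        return attn.transpose(1, 2) @ enc_states                 # (batch, num_heads, hidden_dim)

hidden = 256
bridge = AttentionBridge(hidden)
# One encoder per language; only the bridge is shared across languages.
encoders = {lang: nn.GRU(128, hidden, batch_first=True) for lang in ("en", "fr", "de")}

src = torch.randn(4, 20, 128)           # a batch of 4 embedded "English" sentences
enc_out, _ = encoders["en"](src)
sentence_repr = bridge(enc_out)          # same shape regardless of source language
print(sentence_repr.shape)               # torch.Size([4, 10, 256])
```

    Because the bridge always emits the same number of vectors, a decoder for any target language can attend over the same fixed interface, which is what makes it possible to plug new encoders and decoders into the shared representation.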

    A Multi-task Learning Approach to Text Simplification

    We propose a multi-task learning approach to reducing text complexity that combines text summarization and simplification methods. Two datasets were used for this research: the Simple English Wikipedia dataset for simplification and the CNN/DailyMail dataset for summarization. We describe several experiments with reducing text complexity. One experiment consists of first training the model on summarization data and then fine-tuning it on simplification data. Another experiment involves training the model on both datasets simultaneously while augmenting source texts with a task-specific tag that tells the model which task (summarization or simplification) to perform on a given text. Models with a similar architecture were also trained on each dataset separately for comparison. Our experiments show that the multi-task learning approach with task-specific tags is more effective than the fine-tuning approach, and that models trained on both tasks simultaneously can perform as well at each task as models trained only for that specific task.
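    The task-tag mechanism can be illustrated with a short sketch; the tag strings, example sentences, and the generation call in the final comment are hypothetical placeholders rather than the authors' exact setup.

```python
# Hedged sketch of joint training with task-specific control tokens:
# each source text is prefixed with a tag telling one shared seq2seq model
# which task to perform. Tags and data below are illustrative assumptions.
import random

SIMPLIFY_TAG = "<simplify>"
SUMMARIZE_TAG = "<summarize>"

def tag_examples(pairs, tag):
    """pairs: iterable of (source_text, target_text) tuples."""
    return [(f"{tag} {src}", tgt) for src, tgt in pairs]

simplification_data = [("The feline reposed upon the rug.", "The cat sat on the rug.")]
summarization_data = [("A long news article ...", "A short summary ...")]

# Joint training set: both tasks mixed and shuffled, distinguished only by the tag.
train_set = tag_examples(simplification_data, SIMPLIFY_TAG) + \
            tag_examples(summarization_data, SUMMARIZE_TAG)
random.shuffle(train_set)

# At inference time the tag selects the behaviour of the shared model, e.g.:
# model.generate(f"{SIMPLIFY_TAG} {input_text}")   # hypothetical call
```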

    Neural Machine Translation with Extended Context


    Measuring Semantic Abstraction of Multilingual NMT with Paraphrase Recognition and Generation Tasks

    In this paper, we investigate whether multilingual neural translation models learn stronger semantic abstractions of sentences than bilingual ones. We test this hypothesis by measuring the perplexity of such models when applied to paraphrases of the source language. The intuition is that an encoder produces better representations if a decoder is capable of recognizing synonymous sentences in the same language, even though the model is never trained for that task. In our setup, we add 16 different auxiliary languages to a bidirectional bilingual baseline model (English-French) and test it with in-domain and out-of-domain paraphrases in English. The results show that the perplexity is significantly reduced in each of the cases, indicating that meaning can be grounded in translation. This is further supported by a study on paraphrase generation that we also include at the end of the paper.
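    The evaluation idea can be sketched as follows, under the assumption of a generic encoder-decoder with a scoring interface; the `model.score` call is hypothetical and stands in for whatever forced-decoding log-probability routine a given NMT toolkit exposes.

```python
# Hedged sketch: encode the original English sentence, force-decode its
# English paraphrase, and report the decoder's perplexity over the
# paraphrase tokens. The `model` interface is a hypothetical placeholder.
import math

def paraphrase_perplexity(model, source: str, paraphrase: str) -> float:
    """Lower perplexity = the model finds the paraphrase more 'expected'
    given the meaning it extracted from the source sentence."""
    # log_probs: list of log P(token_i | source, paraphrase_<i)
    log_probs = model.score(src=source, tgt=paraphrase)   # hypothetical call
    return math.exp(-sum(log_probs) / len(log_probs))

# Comparing a bilingual baseline against a model trained with extra auxiliary
# languages: a larger drop in perplexity on English paraphrases suggests a more
# language-independent ("grounded") sentence representation.
```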